Skip to content

Conversation

@ivarflakstad
Copy link
Member

@ivarflakstad ivarflakstad commented Nov 20, 2025

Set math mode via old api if macos version < 15 or ios < 18.

Fixes #3185

@ivarflakstad ivarflakstad marked this pull request as ready for review November 20, 2025 22:07
@jacobgorm
Copy link

jacobgorm commented Nov 21, 2025

My build now completes on older MacOS, which is awesome :)

However, It seems to me that the fast-math flag check should be moved to build-time, as having to set an envvar in production is cumbersome and error-prone. I've asked Codex and it it suggests these two approaches, first is IMO cleanest via a feature flag:

diff --git a/candle-metal-kernels/Cargo.toml b/candle-metal-kernels/Cargo.toml
index 8ce09a32..c616d87d 100644
--- a/candle-metal-kernels/Cargo.toml
+++ b/candle-metal-kernels/Cargo.toml
@@ -9,6 +9,9 @@ keywords = ["blas", "tensor", "machine-learning"]
 categories = ["science"]
 license = "MIT OR Apache-2.0"
 
+[features]
+default = ["fast-math"]
+fast-math = []
 
 [dependencies]
 half = { version = "2.5.0", features = [
diff --git a/candle-metal-kernels/src/kernel.rs b/candle-metal-kernels/src/kernel.rs
index f941e302..bebdd187 100644
--- a/candle-metal-kernels/src/kernel.rs
+++ b/candle-metal-kernels/src/kernel.rs
@@ -2,7 +2,6 @@ use crate::source::{
     AFFINE, BINARY, CAST, CONV, FILL, INDEXING, MLX_GEMM, MLX_SORT, QUANTIZED, RANDOM, REDUCE,
     SDPA, SORT, TERNARY, UNARY,
 };
-use crate::utils::get_env_bool;
 use crate::{
     ComputePipeline, ConstantValues, Device, Function, Library, MTLCompileOptions,
     MTLMathFloatingPointFunctions, MTLMathMode, MetalKernelError, Source,
@@ -182,7 +181,7 @@ fn get_compile_options() -> Retained<MTLCompileOptions> {
     let compile_options = MTLCompileOptions::new();
     //unsafe { compile_options.setEnableLogging(true) };
 
-    let fast_math_enabled = get_env_bool("CANDLE_METAL_ENABLE_FAST_MATH", true);
+    let fast_math_enabled = cfg!(feature = "fast-math");
     // Ref availability:
     // https://developer.apple.com/documentation/metal/mtlcompileoptions/mathmode
     if available!(macos = 15, ios = 18) {
diff --git a/candle-metal-kernels/src/utils.rs b/candle-metal-kernels/src/utils.rs
index 1ad647d7..8cf4006a 100644
--- a/candle-metal-kernels/src/utils.rs
+++ b/candle-metal-kernels/src/utils.rs
@@ -1,6 +1,5 @@
 use crate::metal::{Buffer, CommandBuffer, ComputeCommandEncoder, ComputePipeline};
 use crate::MTLSize;
-use std::ffi::OsStr;
 use std::ops::Deref;
 use std::sync::{RwLockReadGuard, RwLockWriteGuard};
 
@@ -236,15 +235,3 @@ impl<'a, T> From<RwLockWriteGuard<'a, T>> for RwLockGuard<'a, T> {
     fn from(g: RwLockWriteGuard<'a, T>) -> Self {
         RwLockGuard::Write(g)
     }
-}
-
-fn is_truthy(s: String) -> bool {
-    match s.as_str() {
-        "true" | "t" | "yes" | "y" | "1" => true,
-        _ => false,
-    }
-}
-
-pub(crate) fn get_env_bool<K: AsRef<OsStr>>(key: K, default: bool) -> bool {
-    std::env::var(key).map(is_truthy).unwrap_or(default)
-}

and second keeps the envvar, but resolves it during build:

diff --git a/candle-metal-kernels/src/kernel.rs b/candle-metal-kernels/src/kernel.rs
index f941e302..6c07cf0c 100644
--- a/candle-metal-kernels/src/kernel.rs
+++ b/candle-metal-kernels/src/kernel.rs
@@ -2,7 +2,6 @@ use crate::source::{
     AFFINE, BINARY, CAST, CONV, FILL, INDEXING, MLX_GEMM, MLX_SORT, QUANTIZED, RANDOM, REDUCE,
     SDPA, SORT, TERNARY, UNARY,
 };
-use crate::utils::get_env_bool;
 use crate::{
     ComputePipeline, ConstantValues, Device, Function, Library, MTLCompileOptions,
     MTLMathFloatingPointFunctions, MTLMathMode, MetalKernelError, Source,
@@ -182,7 +181,7 @@ fn get_compile_options() -> Retained<MTLCompileOptions> {
     let compile_options = MTLCompileOptions::new();
     //unsafe { compile_options.setEnableLogging(true) };
 
-    let fast_math_enabled = get_env_bool("CANDLE_METAL_ENABLE_FAST_MATH", true);
+    let fast_math_enabled = FAST_MATH_ENABLED;
     // Ref availability:
     // https://developer.apple.com/documentation/metal/mtlcompileoptions/mathmode
     if available!(macos = 15, ios = 18) {
@@ -200,3 +199,9 @@ fn get_compile_options() -> Retained<MTLCompileOptions> {
     }
     compile_options
 }
+
+const FAST_MATH_ENABLED: bool = match option_env!("CANDLE_METAL_ENABLE_FAST_MATH") {
+    Some("true") | Some("t") | Some("yes") | Some("y") | Some("1") => true,
+    Some(_) => false,
+    None => true,
+};
diff --git a/candle-metal-kernels/src/utils.rs b/candle-metal-kernels/src/utils.rs
index 1ad647d7..8cf4006a 100644
--- a/candle-metal-kernels/src/utils.rs
+++ b/candle-metal-kernels/src/utils.rs
@@ -1,6 +1,5 @@
 use crate::metal::{Buffer, CommandBuffer, ComputeCommandEncoder, ComputePipeline};
 use crate::MTLSize;
-use std::ffi::OsStr;
 use std::ops::Deref;
 use std::sync::{RwLockReadGuard, RwLockWriteGuard};
 
@@ -236,15 +235,3 @@ impl<'a, T> From<RwLockWriteGuard<'a, T>> for RwLockGuard<'a, T> {
     fn from(g: RwLockWriteGuard<'a, T>) -> Self {
         RwLockGuard::Write(g)
     }
-}
-
-fn is_truthy(s: String) -> bool {
-    match s.as_str() {
-        "true" | "t" | "yes" | "y" | "1" => true,
-        _ => false,
-    }
-}
-
-pub(crate) fn get_env_bool<K: AsRef<OsStr>>(key: K, default: bool) -> bool {
-    std::env::var(key).map(is_truthy).unwrap_or(default)
-}

@ivarflakstad
Copy link
Member Author

Great!

Wrt having the flag set at buildtime I disagree.

We don't have to supply the env var at all as we have a default.

We want to be able to toggle it easily at runtime so that we can verify the correctness of the model implementations on metal.
We don't want to rebuild everything just for this kind of verification/experimentation.

@ivarflakstad
Copy link
Member Author

Additionally, I have a sneaking suspicion that this flag will be removed and that we'll disable fast math in the future. I haven't noticed any meaningful performance improvements from having it enabled vs not.

@jacobgorm
Copy link

I am happy as long as it unbreaks the build and is on by default :). In my experience, fast-math makes a huge difference on CPU for fp32 models, but not sure how much it matters for fp16 on GPU, and probably not at all for quantized models. That said, I have never seen any adverse effects of having it on.

Wrt this being a build-time setting, I have always preferred to compile my shaders ahead of time, so there is naturally becomes part of the build

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Lost support for MacOS < 15.0 Sequoia

3 participants